-
Notifications
You must be signed in to change notification settings - Fork 26
Abstract reader/writer design #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows 3.11/3.12 CI test failure:
FAIL: testSimple (__main__.H5pyWriterTest.testSimple)
----------------------------------------------------------------------
Traceback (most recent call last):
File "D:\a\hdf5-json\hdf5-json\test\unit\h5py_writer_test.py", line 155, in testSimple
self.assertEqual(len(g1.attrs), 1)
AssertionError: 2 != 1
# Copyright by The HDF Group. # | ||
# All rights reserved. # | ||
# # | ||
# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H5Serv
-> HSDS
(or something else, if we want the reader/writer to be independent)
pass | ||
|
||
|
||
class H5NullReader(H5Reader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the motivation to have this as a default reader instead of something like filesystem reading? If no reader is specified and all the reads are no-ops, I imagine the user would want to be notified about that as an error, not have all the operations seem to go well but not do anything - assuming I understand this correctly
# Copyright by The HDF Group. # | ||
# All rights reserved. # | ||
# # | ||
# This file is part of H5Serv (HDF5 REST Server) Service, Libraries and # |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
H5Serv
-> HSDS
again - it looks like this occurs in a lot of files and can be fixed with a quick find and replace.
return self._reader | ||
|
||
@reader.setter | ||
def reader(self, value: H5Reader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be preferable to distinguish the two reader()
functions as get_reader()
and set_reader()
, rather than leaving it up to the arguments. Same for writer()
.
@property | ||
def root_id(self): | ||
""" return root uuid """ | ||
return self._root_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is one hdf5db instance associated directly with a single top-level file? This seems a little unusual, but perhaps it makes sense since each file may need a unique reader/writer. Perhaps 'hdf5db' as a name for this object doesn't communicate it's exact purpose. Would something like "hdf5IOmanager" make more sense?
# storing db in the file itself, so we can link to the object directly | ||
col[id] = obj.ref # save attribute ref to object | ||
type_json = ctype_json["type"].copy() | ||
type_json["id"] = ctype_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this step of setting the id
necessary? It seems like the id
field in the stored type json should reflect the constructed ctype id, since that id is what we used to find this json in the first place.
# assume attr_type is a uuid of a named datatype | ||
is_committed_type = True | ||
# TBD: if dtype is a committed ref type, fetch it first | ||
# TBD: also, check special case for complex types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these TBDs still need to be addressed?
raise KeyError(f"ctype: {ctype_id} not found") | ||
ctype_json = self.getObjectById(ctype_id) | ||
type_json = ctype_json["type"].copy() | ||
type_json["id"] = ctype_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question as above about the necessity of setting this field.
raise IOError(errno.EINVAL, msg) | ||
if "fillValue" in cpl: | ||
fillValue = cpl["fillValue"] | ||
# TBD: fix for compound types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this TBD still need to be addressed?
stop = start + sel_inter.count[dim] | ||
slices.append(slice(start, stop, 1)) | ||
slices = tuple(slices) | ||
# TBD: needs updating to work in the general case! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this TBD been addressed?
raise IOError(errno.EINVAL, msg) | ||
return link_json | ||
|
||
def _addLink(self, grp_id, name, link_json): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If _addLink
is meant to be an internal function, should it be moved to utils?
|
||
for obj_id in self.db: | ||
# skip deleted objects | ||
if self.db[obj_id] is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it will still count objects that have been marked as deleted with "DELETED"
and marked dirty, since e.g. deleteLink
doesn't appear to remove the link from the database.
try: | ||
linkObj = parent.get(link_name, None, False, True) | ||
linkClass = linkObj.__class__.__name__ | ||
except TypeError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there no way to distinguish a UD link from other potential causes of TypeError here and in _getLink
?
self._flush_time = 0.0 | ||
self._f = None # h5py file handle | ||
|
||
def _copy_element(self, val, src_dt, tgt_dt, fout=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the _copy_*
routines here and in h5py_reader should be renamed to indicate that they're also performing a conversion, and the direction of that conversion. Something like _element_json_to_h5py_rep()/_element_h5py_to_json_rep()
would prevent confusion between the two identically named functions that perform opposite operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same goes for _copy_array
.
for title in titles: | ||
link_json = links_json[title] | ||
link_class = link_json["class"] | ||
if "DELETED" in link_json: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this also handles deletion, would it be more accurate to call _createObjects()
something like _syncObjects()
?
hdf5db now supports interaces for reader and/or writer objects.
Currently json and hdf5 (using h5py) are supported.